Spoken Language Synthesis: Experiments in Synthesis of Spontaneous Monologues
Authors
Abstract
While TTS technology has come a long way, there is an ongoing need to bring greater “naturalness” to synthesized speech. One predominant aspect of natural, spontaneous speech is its variability along several dimensions: vocabulary, prosodic features, paralinguistic elements and discourse markers. Such variability is typically carefully avoided or minimized in conventional text-to-speech for the sake of high intelligibility. However, in applications requiring immersive anthropomorphic human-machine interfaces, including those with computer-generated avatars, there is strong demand for synthesized speech output that mimics human speech. In this paper we investigate methods for, and the usefulness of, incorporating certain features that characterize fluent natural speech in order to increase the “naturalness” of synthesized speech. We propose a data-driven approach for modeling both speaker-independent and speaker-dependent spontaneous speech features at the lexical and acoustic levels (so-called VoiceFonts). This method has the potential to create unique, custom speaking styles for a target speaker. A simple limited-domain synthesizer was built on this idea using data from a classroom lecture and was used to synthesize 28 target utterances. Results from preliminary listening experiments with 19 volunteers showed that such an approach indeed improves naturalness, without significant loss in intelligibility, beyond the limitations of the underlying waveform synthesis. For example, in a 4-way choice listening test, subjects correctly identified natural speech with a probability of 0.6 and confused the clips synthesized in this work with natural speech with a probability of 0.27.
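The two figures from the 4-way choice test can be read as conditional response rates: for each stimulus type, the fraction of listener judgements that chose a given answer (0.6 is the rate of answering "natural" for natural clips, 0.27 the rate of answering "natural" for the synthesized clips). The minimal Python sketch below shows that computation, assuming judgements are recorded as (stimulus, response) pairs. The category names and the toy judgements are hypothetical and are not the paper's data; the abstract does not name the four response options. For reference, chance level in a 4-way choice is 1/4 = 0.25.

```python
from collections import Counter, defaultdict

# Hypothetical labels for the four response options (assumption: the
# abstract does not say which four categories were offered to listeners).
CATEGORIES = ("natural", "proposed", "baseline_a", "baseline_b")


def confusion_probabilities(trials):
    """Return P(response | stimulus) for every stimulus type in `trials`.

    `trials` is an iterable of (stimulus_type, response) pairs,
    one pair per listener judgement.
    """
    counts = defaultdict(Counter)
    for stimulus, response in trials:
        if response not in CATEGORIES:
            raise ValueError(f"unknown response: {response!r}")
        counts[stimulus][response] += 1

    probs = {}
    for stimulus, row in counts.items():
        total = sum(row.values())
        # Normalize each row so the four response rates sum to 1.
        probs[stimulus] = {c: row[c] / total for c in CATEGORIES}
    return probs


# Toy, made-up judgements (NOT results from the paper).
trials = [
    ("natural", "natural"), ("natural", "natural"),
    ("natural", "natural"), ("natural", "proposed"),
    ("proposed", "natural"), ("proposed", "proposed"),
    ("proposed", "proposed"), ("proposed", "baseline_b"),
]

p = confusion_probabilities(trials)
print(p["natural"]["natural"])   # 0.75: natural clips correctly identified
print(p["proposed"]["natural"])  # 0.25: synthesized clips taken for natural
print(1 / len(CATEGORIES))       # 0.25: chance level for a 4-way choice
```

Comparing the "taken for natural" rate of synthesized clips against the correct-identification rate of natural clips (and against chance) is what makes the two reported numbers interpretable together.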
Similar resources
Discourse Structure in Spoken Language: Studies on Speech Corpora
A better understanding of the intonational characteristics of spoken discourse may lead to new empirical techniques for identifying discourse structure from speech, as well as new algorithms for enhancing the naturalness of synthetic speech. This paper summarizes results of pilot studies that demonstrate reliable correlations of discourse and speech properties, and reports findings on a new cor...
A generic algorithm for generating spoken monologues
The defining property of a Concept-to-Speech system is that it combines language and speech generation. Language generation converts the input concepts into natural language, which speech generation subsequently transforms into speech. Potentially, this leads to a more ‘natural sounding’ output than can be achieved in a plain Text-to-Speech system, since the correct placement of pitch accents a...
Voiced/unvoiced transitions in speech as a potential bio-marker to detect Parkinson's disease
Several studies have addressed the automatic classification of speakers with Parkinson’s disease (PD) and healthy controls (HC). Most of the studies are based on speech recordings of sustained vowels, isolated words, and single sentences. Only a few investigations have considered read texts and/or spontaneous speech. This paper addresses two main questions still open regarding the automatic analy...
Parsing Spoken Language without Syntax: A Microsemantic Approach
Parsing spontaneous speech is a difficult task because of the ungrammatical nature of most spoken utterances. To overcome this problem, we propose in this paper to handle spoken language without considering syntax. We thus describe a microsemantic parser which is based solely on an associative network of semantic priming. Experimental results on spontaneous speech show that this parser st...
Some experiments in the Czech spontaneous speech recognition domain
A spoken/dialog interpretation system is proposed, using prosodic information systematically at all processing stages. A prosody module is used for parsing, dialog understanding, translation, generation and speech synthesis.